R Markdown

This is a tutorial on how to explore and analysis remote sensing (RS) data with R. I’ve just coded in Rmarkdown the material created by Ghosh and Hijmans 2023. Author: Carolina Barbosa

RS products consist of observations of reflectance data. Reflectance is normally measured for different wavelengths of the electromagnetic spectrum. For example, it can be measured in the near-infrared, red, green, and blue wavelengths. If that is the case, satellite data can be referred to as “multi-spectral” (or hyper-spectral if there are many separate wavelengths).

A single “satellite image” has multiple observations for each pixel, that are stored in separate raster layers. These layers (variables) are referred to as “bands” as they typically represent reflectance values for a particular spectral bandwith, and grid cells are referred to as “pixels”.

We’re going to use a spatial subset of a Landsat 8 scene collected on June 14, 2017. The subset covers the area between Concord and Stockton, in California, USA.

dir.create("data", showWarnings = FALSE)
if (!file.exists("data/rs/samples.rds")) {
  download.file("https://biogeo.ucdavis.edu/data/rspatial/rs.zip", dest = "data/rs.zip")
  unzip("data/rs.zip", exdir="data")
}

Image properties and exploration

Create SpatRaster objects for single Landsat layers (bands)

library(terra)
## Warning: package 'terra' was built under R version 4.2.2
## terra 1.7.3
# Blue
b2 <- rast('data/rs/LC08_044034_20170614_B2.tif')
# Green
b3 <- rast('data/rs/LC08_044034_20170614_B3.tif')
# Red
b4 <- rast('data/rs/LC08_044034_20170614_B4.tif')
# Near Infrared (NIR)
b5 <- rast('data/rs/LC08_044034_20170614_B5.tif')

Now let’s print the variables to check. You can see the spatial resolution, extent, number of layers, coordinate reference system and more.

b2
## class       : SpatRaster 
## dimensions  : 1245, 1497, 1  (nrow, ncol, nlyr)
## resolution  : 30, 30  (x, y)
## extent      : 594090, 639000, 4190190, 4227540  (xmin, xmax, ymin, ymax)
## coord. ref. : WGS 84 / UTM zone 10N (EPSG:32610) 
## source      : LC08_044034_20170614_B2.tif 
## name        : LC08_044034_20170614_B2 
## min value   :               0.0748399 
## max value   :               0.7177562
# coordinate reference system (CRS)
crs(b2)
## [1] "PROJCRS[\"WGS 84 / UTM zone 10N\",\n    BASEGEOGCRS[\"WGS 84\",\n        DATUM[\"World Geodetic System 1984\",\n            ELLIPSOID[\"WGS 84\",6378137,298.257223563,\n                LENGTHUNIT[\"metre\",1]]],\n        PRIMEM[\"Greenwich\",0,\n            ANGLEUNIT[\"degree\",0.0174532925199433]],\n        ID[\"EPSG\",4326]],\n    CONVERSION[\"UTM zone 10N\",\n        METHOD[\"Transverse Mercator\",\n            ID[\"EPSG\",9807]],\n        PARAMETER[\"Latitude of natural origin\",0,\n            ANGLEUNIT[\"degree\",0.0174532925199433],\n            ID[\"EPSG\",8801]],\n        PARAMETER[\"Longitude of natural origin\",-123,\n            ANGLEUNIT[\"degree\",0.0174532925199433],\n            ID[\"EPSG\",8802]],\n        PARAMETER[\"Scale factor at natural origin\",0.9996,\n            SCALEUNIT[\"unity\",1],\n            ID[\"EPSG\",8805]],\n        PARAMETER[\"False easting\",500000,\n            LENGTHUNIT[\"metre\",1],\n            ID[\"EPSG\",8806]],\n        PARAMETER[\"False northing\",0,\n            LENGTHUNIT[\"metre\",1],\n            ID[\"EPSG\",8807]]],\n    CS[Cartesian,2],\n        AXIS[\"(E)\",east,\n            ORDER[1],\n            LENGTHUNIT[\"metre\",1]],\n        AXIS[\"(N)\",north,\n            ORDER[2],\n            LENGTHUNIT[\"metre\",1]],\n    USAGE[\n        SCOPE[\"Engineering survey, topographic mapping.\"],\n        AREA[\"Between 126°W and 120°W, northern hemisphere between equator and 84°N, onshore and offshore. Canada - British Columbia (BC); Northwest Territories (NWT); Nunavut; Yukon. United States (USA) - Alaska (AK).\"],\n        BBOX[0,-126,84,-120]],\n    ID[\"EPSG\",32610]]"
# Number of cells, rows, columns
ncell(b2)
## [1] 1863765
dim(b2)
## [1] 1245 1497    1
# spatial resolution
res(b2)
## [1] 30 30
# Number of bands
nlyr(b2)
## [1] 1
# Do the bands have the same extent, number of rows and columns, projection, resolution,and origin?
compareGeom(b2,b3)
## [1] TRUE

Let’s create a SpatRaster with multiple layers from the existing SpatRaster (single layer) objects and check the properties of the multi-band image.

s <- c(b5, b4, b3)
s
## class       : SpatRaster 
## dimensions  : 1245, 1497, 3  (nrow, ncol, nlyr)
## resolution  : 30, 30  (x, y)
## extent      : 594090, 639000, 4190190, 4227540  (xmin, xmax, ymin, ymax)
## coord. ref. : WGS 84 / UTM zone 10N (EPSG:32610) 
## sources     : LC08_044034_20170614_B5.tif  
##               LC08_044034_20170614_B4.tif  
##               LC08_044034_20170614_B3.tif  
## names       : LC08_04403~0170614_B5, LC08_04403~0170614_B4, LC08_04403~0170614_B3 
## min values  :          0.0008457669,            0.02084067,            0.04259216 
## max values  :          1.0124315023,            0.78617686,            0.69246972

We can also create the multi-layer SpatRaster using the filenames. First let’s create a list of raster layers to use.

filenames <- paste0('data/rs/LC08_044034_20170614_B', 1:11, ".tif")
filenames
##  [1] "data/rs/LC08_044034_20170614_B1.tif" 
##  [2] "data/rs/LC08_044034_20170614_B2.tif" 
##  [3] "data/rs/LC08_044034_20170614_B3.tif" 
##  [4] "data/rs/LC08_044034_20170614_B4.tif" 
##  [5] "data/rs/LC08_044034_20170614_B5.tif" 
##  [6] "data/rs/LC08_044034_20170614_B6.tif" 
##  [7] "data/rs/LC08_044034_20170614_B7.tif" 
##  [8] "data/rs/LC08_044034_20170614_B8.tif" 
##  [9] "data/rs/LC08_044034_20170614_B9.tif" 
## [10] "data/rs/LC08_044034_20170614_B10.tif"
## [11] "data/rs/LC08_044034_20170614_B11.tif"
landsat <- rast(filenames)
landsat
## class       : SpatRaster 
## dimensions  : 1245, 1497, 11  (nrow, ncol, nlyr)
## resolution  : 30, 30  (x, y)
## extent      : 594090, 639000, 4190190, 4227540  (xmin, xmax, ymin, ymax)
## coord. ref. : WGS 84 / UTM zone 10N (EPSG:32610) 
## sources     : LC08_044034_20170614_B1.tif  
##               LC08_044034_20170614_B2.tif  
##               LC08_044034_20170614_B3.tif  
##               ... and 8 more source(s)
## names       : LC08_~14_B1, LC08_~14_B2, LC08_~14_B3, LC08_~14_B4,  LC08_~14_B5,  LC08_~14_B6, ... 
## min values  :  0.09641791,   0.0748399,  0.04259216,  0.02084067, 0.0008457669, -0.007872183, ... 
## max values  :  0.73462820,   0.7177562,  0.69246972,  0.78617686, 1.0124315023,  1.043204546, ...

Above we created a SpatRaster with 11 layers. The layers represent reflection intensity in the following wavelengths: Ultra Blue, Blue, Green, Red, Near Infrared (NIR), Shortwave Infrared (SWIR) 1, Shortwave Infrared (SWIR) 2,Panchromatic, Cirrus, Thermal Infrared (TIRS) 1, Thermal Infrared (TIRS) 2.

We can plot individual layers of a multi-spectral image:

par(mfrow = c(2,2))
plot(b2, main = "Blue", col = gray(0:100 / 100))
plot(b3, main = "Green", col = gray(0:100 / 100))
plot(b4, main = "Red", col = gray(0:100 / 100))
plot(b5, main = "NIR", col = gray(0:100 / 100))

The legends of the maps created above can range between 0 and 1. Notice the difference in shading and range of legends between the different bands. This is because different surface features reflect the incident solar radiation differently. Each layer represent how much incident solar radiation is reflected for a particular wavelength range. For example, vegetation reflects more energy in NIR than other wavelengths and thus appears brighter. In contrast, water absorbs most of the energy in the NIR wavelength and it appears dark. We do not gain that much information from these grey-scale plots; they are often combined into a “composite” to create more interesting plots.

To make a “true (or natural) color” image, that is, something that looks like a normal photograph (vegetation in green, water blue etc), we need bands in the red, green and blue regions. For this Landsat image, band 4 (red), 3 (green), and 2 (blue) can be used. With plotRGB we can combine them into a single composite image. Note that use of strecth = “lin” (otherwise the image will be pitch-dark).

landsatRGB <- c(b4, b3, b2) #RGB
plotRGB(landsatRGB, stretch = "lin") #The true-color composite reveals much more about the landscape than the earlier gray images.

#False color (NIR + red + green) example -> focus on vegetation (red)
landsatFCC <- c(b5, b4, b3)
plotRGB(landsatFCC, stretch="lin")

Subset, rename bands and spatial subset or crop

You can select specific layers (bands) using subset function, or via indexing.

# select first 3 bands only
landsatsub1 <- subset(landsat, 1:3)
# same
landsatsub2 <- landsat[[1:3]]

# Number of bands in the original and new data
nlyr(landsat)
## [1] 11
## [1] 11
nlyr(landsatsub1)
## [1] 3
## [1] 3
nlyr(landsatsub2)
## [1] 3
## [1] 3

#We won’t use the last four bands in landsat. You can remove those by selecting the ones we want.
landsat <- subset(landsat, 1:7)
names(landsat)
## [1] "LC08_044034_20170614_B1" "LC08_044034_20170614_B2"
## [3] "LC08_044034_20170614_B3" "LC08_044034_20170614_B4"
## [5] "LC08_044034_20170614_B5" "LC08_044034_20170614_B6"
## [7] "LC08_044034_20170614_B7"
#For clarity, it is useful to set the names of the bands
names(landsat) <- c('ultra-blue', 'blue', 'green', 'red', 'NIR', 'SWIR1', 'SWIR2')
names(landsat)
## [1] "ultra-blue" "blue"       "green"      "red"        "NIR"       
## [6] "SWIR1"      "SWIR2"

Spatial subsetting can be used to limit analysis to a geographic subset of the image. Spatial subsets can be created with the crop function, using a SpatExtent object, or another spatial object from which an Extent can be extracted.

ext(landsat)
## SpatExtent : 594090, 639000, 4190190, 4227540 (xmin, xmax, ymin, ymax)
## SpatExtent : 594090, 639000, 4190190, 4227540 (xmin, xmax, ymin, ymax)
e <- ext(624387, 635752, 4200047, 4210939)
# crop landsat by the extent
landsatcrop <- crop(landsat, e)

Saving file

writeRaster(landsatcrop, filename="cropped-landsat.tif", overwrite=TRUE)

Extract cell values

The extract function is used to get raster values at the locations of other spatial data. You can use points, lines, polygons or an Extent (rectangle) object. You can also use cell numbers to extract values. When using points, extract returns the values of a SpatRaster object for the cells in which a set of points fall.

# load the polygons with land use land cover information
samp <- readRDS('data/rs/lcsamples.rds')
# generate 50 point samples from the polygons
set.seed(555)
ptsamp <- spatSample(samp, 50, 'regular')
# We use the x-y coordinates to extract the spectral values for the locations
df <- extract(landsat, ptsamp)
# To see some of the reflectance values
head(df)
##   ID ultra-blue       blue      green        red       NIR     SWIR1      SWIR2
## 1  1  0.1152851 0.09450950 0.09054089 0.06332441 0.4896148 0.1536918 0.07551219
## 2  2  0.1158706 0.09381554 0.08429519 0.05584258 0.4386517 0.1711277 0.08882765
## 3  3  0.1151116 0.09260110 0.09442276 0.05775099 0.5586210 0.1705638 0.07067610
## 4  4  0.1100153 0.08713611 0.08986861 0.05057278 0.4605116 0.1765059 0.07436280
## 5  5  0.1109695 0.08767828 0.08394821 0.04983544 0.6520245 0.1484437 0.04799209
## 6  6  0.1070009 0.08368797 0.06970022 0.04556321 0.3409542 0.1594387 0.07312667

Spectral profiles

A plot of the spectrum (all bands) for pixels representing a certain earth surface features (e.g. water) is known as a spectral profile. Such profiles demonstrate the differences in spectral properties of various earth surface features and constitute the basis for image analysis. Spectral values can be extracted from any multispectral data set using extract function. In the below example, we extracted values of Landsat data for the samples. These samples include: cropland, water, fallow, built and open. First we compute the mean reflectance values for each class and each band and then we plot the mean spectra of these features.

ms <- aggregate(df[,-1], list(ptsamp$class), mean)
# instead of the first column, we use row names
rownames(ms) <- ms[,1]
ms <- ms[,-1]
ms
##          ultra-blue       blue      green        red        NIR      SWIR1
## built     0.1715777 0.16020858 0.15984533 0.17228251 0.22624379 0.21830656
## cropland  0.1123755 0.08990475 0.08546264 0.05381490 0.49006296 0.16329528
## fallow    0.1348246 0.11801761 0.10233108 0.10520815 0.15550623 0.23542800
## open      0.1396925 0.13829443 0.15342000 0.20654889 0.34112771 0.35808941
## water     0.1342292 0.11744489 0.10053341 0.07981103 0.04949535 0.03381014
##               SWIR2
## built    0.18381969
## cropland 0.07174958
## fallow   0.21624273
## open     0.21559214
## water    0.02739490
# Create a vector of color for the land cover classes for use in plotting
mycolor <- c('darkred', 'yellow', 'burlywood', 'cyan', 'blue')

#transform ms from a data.frame to a matrix
ms <- as.matrix(ms)

# First create an empty plot
plot(0, ylim=c(0,0.6), xlim = c(1,7), type='n', xlab="Bands", ylab = "Reflectance")
# add the different classes
for (i in 1:nrow(ms)){
lines(ms[i,], type = "l", lwd = 3, lty = 1, col = mycolor[i])
}
# Title
title(main="Spectral Signatures", font.main = 2)
# Legend
legend("topleft", rownames(ms),
cex=0.8, col=mycolor, lty = 1, lwd =3, bty = "n")

We can see that the spectral signatures (profile) shows (dis)similarity in the reflectance of different features on the earth’s surface (or above it). Water shows relatively low reflection in all wavelengths, and built, fallow and open have relatively high reflectance in the longer wavelengts.

Math operations and stats in RS images

Let’s keep working on the same Landsat data

rfiles <- paste0('data/rs/LC08_044034_20170614_B', 1:11, ".tif")
landsat <- rast(rfiles)
landsatRGB <- landsat[[c(4,3,2)]]
landsatFCC <- landsat[[c(5,4,3)]]

Vegetation indices

In the first example we write a custom math function to calculate the Normalized Difference Vegetation Index (NDVI).Let’s define a general function for a ratio based (vegetation) index. In the function below, img is a muti-layer SpatRaster object and i and k are the indices of the layers (layer numbers) used to compute the vegetation index.You will be able to see the variation in greenness from the plot.

vi <- function(img, k, i) {
bk <- img[[k]]
bi <- img[[i]]
vi <- (bk - bi) / (bk + bi)
return(vi)
}

# For Landsat NIR = 5, red = 4.
ndvi <- vi(landsat, 5, 4)
plot(ndvi, col=rev(terrain.colors(10)), main = "NDVI")

Histogram and thresholding

We can explore the distribution of values contained within our raster using hist to produces a histogram. Histograms are often useful in identifying outliers and bad data values in our raster data.

hist(ndvi, main = "NDVI values", xlab = "NDVI", ylab= "Frequency",
col = "wheat", xlim = c(-0.5, 1), breaks = 30, xaxt = "n")
## Warning: [hist] a sample of54% of the cells was used
## Warning: [hist] a sample of54% of the cells was used

axis(side=1, at = seq(-0.6, 1, 0.2), labels = seq(-0.6, 1, 0.2))

We can apply basic rules to get an estimate of spatial extent of different Earth surface features. Note that NDVI values are standardized and ranges between -1 to +1. Higher values indicate more green cover. Cells with NDVI values greater than 0.4 are definitely vegetation. The following operation masks all cells that are perhaps not vegetation (NDVI < 0.4).

veg <- clamp(ndvi, 0.4, values=FALSE)
plot(veg, main='Vegetation')

We can also create classes for different intensity of vegetation.

m <- c(-1,0.25, 0.3, 0.4, 0.5, 1)
vegc <- classify(ndvi, m)
plot(vegc, col = rev(terrain.colors(4)), main = 'NDVI based thresholding')

Principal component analysis (PCA)

Multi-spectral data are sometimes transformed to helps to reduce the dimensionality and noise in the data. The principal components transform is a generic data reduction method that can be used to create a few uncorrelated bands from a larger set of correlated bands. You can calculate the same number of principal components as the number of input bands. The first principal component (PC) explains the largest percentage of variance and other PCs explain additional the variance in decreasing order.

set.seed(1)
sr <- spatSample(landsat, 10000)
plot(sr[,c(4,5)], main = "NIR-Red plot")

#This is known as vegetation and soil-line plot
pca <- prcomp(sr, scale = TRUE)
pca
## Standard deviations (1, .., p=11):
##  [1] 2.53668771 1.40078059 1.08362915 0.92501695 0.54958227 0.41473655
##  [7] 0.27030406 0.12220817 0.08661844 0.04763013 0.03609876
## 
## Rotation (n x k) = (11 x 11):
##                                PC1        PC2         PC3         PC4
## LC08_044034_20170614_B1  0.2937880  0.3642123 -0.28481332 -0.06676012
## LC08_044034_20170614_B2  0.3357803  0.3382035 -0.15857739 -0.03635911
## LC08_044034_20170614_B3  0.3612145  0.2650391  0.07275822  0.04141597
## LC08_044034_20170614_B4  0.3676650  0.1641879  0.10447910  0.03578747
## LC08_044034_20170614_B5  0.1591361 -0.1852694  0.71300561  0.32594984
## LC08_044034_20170614_B6  0.3472817 -0.2134275  0.22853576  0.15831890
## LC08_044034_20170614_B7  0.3499019 -0.2367876 -0.11725588  0.06147126
## LC08_044034_20170614_B8  0.3496278  0.1964745  0.08489385  0.02433066
## LC08_044034_20170614_B9  0.1325628 -0.1117693  0.33293128 -0.92447725
## LC08_044034_20170614_B10 0.2534434 -0.4831768 -0.30276292 -0.02013211
## LC08_044034_20170614_B11 0.2507597 -0.4850418 -0.30570381 -0.02197088
##                                  PC5         PC6         PC7          PC8
## LC08_044034_20170614_B1   0.49633695  0.17556592 -0.23610914  0.219084848
## LC08_044034_20170614_B2   0.22593198  0.09672106  0.06516109  0.191800572
## LC08_044034_20170614_B3   0.06775249 -0.01134076  0.29589321 -0.502516608
## LC08_044034_20170614_B4  -0.33203379 -0.06966965  0.60728435  0.005032259
## LC08_044034_20170614_B5   0.51515049 -0.06796061 -0.07734807 -0.094468368
## LC08_044034_20170614_B6  -0.28840709  0.33603872 -0.01251499  0.639691364
## LC08_044034_20170614_B7  -0.24635064  0.53401272 -0.39446741 -0.489769588
## LC08_044034_20170614_B8  -0.33312920 -0.65599341 -0.53808102  0.022845490
## LC08_044034_20170614_B9   0.04429935  0.04329372 -0.02425017 -0.001980250
## LC08_044034_20170614_B10  0.16429483 -0.24521079  0.13682816  0.057604000
## LC08_044034_20170614_B11  0.19644322 -0.24453814  0.11434312 -0.028609160
##                                    PC9          PC10         PC11
## LC08_044034_20170614_B1  -0.1023702943  0.5365796233  0.127065695
## LC08_044034_20170614_B2  -0.1218742739 -0.7691469170 -0.196283512
## LC08_044034_20170614_B3   0.6640135096  0.0551533999  0.059318851
## LC08_044034_20170614_B4  -0.5347992170  0.2350928333  0.021665820
## LC08_044034_20170614_B5  -0.1995974299 -0.0311387747  0.004029261
## LC08_044034_20170614_B6   0.3828069102  0.0639560393 -0.021372642
## LC08_044034_20170614_B7  -0.2438645765 -0.0551104634  0.011338146
## LC08_044034_20170614_B8   0.0005337710  0.0005066492 -0.001773935
## LC08_044034_20170614_B9  -0.0009025984 -0.0006276713  0.001823281
## LC08_044034_20170614_B10  0.0039273450 -0.1709760476  0.686896213
## LC08_044034_20170614_B11  0.0433142125  0.1576520508 -0.684766019
screeplot(pca)

Let´s use a function to restrict prediction to the first two principal components

pca_predict2 <- function(model, data, ...) {
predict(model, data, ...)[,1:2]
}
pci <- predict(landsat, pca, fun=pca_predict2)
## 
|---------|---------|---------|---------|
=========================================
                                          
plot(pci)

The first principal component highlights the boundaries between different land use classes. it is difficult to understand what the second principal component is highlighting. Lets try thresholding again and plot the results:

# quick look at the histogram of second component
hist <- pci[[2]]
m <- c(-Inf,-3,NA, -3,-2,0, -2,-1,1, -1,0,2, 0,1,3, 1,2,4, 2,6,5, 6,Inf,NA)
rcl <- matrix(m, ncol = 3, byrow = TRUE)
rcl
##      [,1] [,2] [,3]
## [1,] -Inf   -3   NA
## [2,]   -3   -2    0
## [3,]   -2   -1    1
## [4,]   -1    0    2
## [5,]    0    1    3
## [6,]    1    2    4
## [7,]    2    6    5
## [8,]    6  Inf   NA
pcClass <- classify(pci[[2]], rcl)

#plotting
par(mfrow=c(1,2))
plotRGB(landsatFCC, stretch = "lin", main="False Color", mar=c(3.1, 3.1, 2.1, 2.1))
plot(pcClass, main="PCA")